Abstract

Accurate real-time tracking of influenza outbreaks helps public health officials make timely
and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO
(AutoRegression with GOogle search data), that uses publicly available online search data. In
addition to having a rigorous statistical foundation, ARGO outperforms all previously available
Google-search-based tracking models, including the latest version of Google Flu Trends, even
though it uses only low-quality search data as input from the publicly available Google Trends and
Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics
but also captures changes in people’s online search behavior over time. ARGO is also flexible,
self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for
real-time tracking of other social events at multiple temporal and spatial resolutions.

This is the preprint of the paper published at PNAS: dx.doi.org/10.1073/pnas.1515373112. 

There are some minor differences between this preprint and the published paper. 

Big data sets are constantly generated nowadays as the activities of millions of users are
collected from internet-based services. Numerous studies have suggested great potential of these big
data sets to detect/manage epidemic outbreaks (influenza [1, 2, 3, 4, 5, 6], Ebola [7], dengue [8]),
predict changes in stock prices [9, 10] and housing prices [11], etc. In 2009, Google Flu Trends
(GFT), a digital disease detection system that uses the volume of selected Google search terms to
estimate current influenza-like illnesses (ILI) activity, was identified by many as a good example of
how big data would transform traditional statistical predictive analysis [12]. However, significant
discrepancies between GFT’s flu estimates and those measured by the Centers for Disease Control
(CDC) in subsequent years led to considerable doubt about the value of digital disease detection
systems [13]. While multiple articles have identified methodological flaws in GFT’s original
algorithm [14, 15, 16] and have led to incremental improvements mi EH EH, a statistical framework
that is theoretically sound and capable of accurate estimation is still lacking. Here we present
such a framework that culminates in a new method that outperforms all existing methodologies for
tracking influenza activity using internet search data.
Influenza outbreaks cause up to 500,000 deaths a year worldwide, and an estimated 3,000 to
50,000 deaths a year in the USA [18]. Our ability to effectively prepare for and respond to these 
outbreaks heavily relies on the availability of accurate real-time estimates of their activity. Existing 
methods to predict the timing, duration and magnitude of flu outbreaks remain limited [19] . Well- 
established clinical methods to track flu activity, such as the CDC’s ILINet, report the percentage 
of patients seeking medical attention with ILI symptoms (www.cdc.gov/flu/). While CDC’s %ILI 
is only a proxy of the flu activity in the population, it can help officials allocate resources in 
preparation for potential surges of patient visits to hospital facilities. See for further
discussion.

CDC’s ILI reports have a delay of one to three weeks due to the time for processing and
aggregating clinical information. This time lag is far from optimal for decision-making purposes.
In order to alleviate this information gap, multiple methods combining climate, demographic and
epidemiological data with mathematical models have been proposed for real-time estimation of flu
activity [19, 22, 23, 24, 25, 26]. In recent years, methods that harness internet-based information
have also been proposed, such as Google [1], Yahoo [2], and Baidu [3] internet searches, Twitter
posts [4], Wikipedia article views [5], clinicians’ queries [6], and crowd-sourced self-reporting mobile
apps such as Influenzanet (Europe) [27], Flutracking (Australia) [28], and Flu Near You (USA)
[29]. Among them, GFT has received most attention and has inspired subsequent digital disease
detection systems [SEiisniEiiEiiias]. Interestingly, Google has never made their raw data public, 
thus making it impossible to reproduce the exact results of GFT.

We highlight three limitations of the original GFT algorithm, previously identified in [15, 16].
First, it was shown that a static approach, which does not take advantage of newly available CDC’s
ILI activity reports as the flu season evolves, produced model drift, leading to inaccurate estimates.
Second, the idea of aggregating the multiple query terms (the independent variables in the GFT
model) into a single variable did not allow for changes in people’s internet search behavior over time
(and thus changes in query terms’ abilities to track flu) to be appropriately captured. Third, GFT
ignored the intrinsic time series properties, such as seasonality of the historical ILI activity, thus
overlooking potentially crucial information that could help produce accurate real-time ILI activity
estimates.

0.1 Our contribution 

The new methodology presented here produces robust and highly accurate ILI activity level
estimates by addressing the three aforementioned shortcomings of the multiple GFT engines. In
addition, we provide a theoretical framework that, for the first time, justifies the prevailing usage
of linear models in the digital disease detection literature by incorporating causality arguments
through a hidden Markov model. This theoretical framework contains as a special case the model
developed in [16]. Our new model not only achieves the goal of (a) dynamically incorporating
new information from CDC reports as they become available and (b) automatically selecting the
most useful Google search queries for estimation as in [16], but also largely improves estimation
by (c) including the long-term cyclic information (seasonality) from past flu seasons on record as
input variables, and (d) using a two-year moving window (which immediately precedes the desired
date of estimation) for the training period to capture the most recent changes in people’s search
patterns and time series behavior [51]. Our methodology efficiently builds a prediction model from
individual search frequency as well as the past records of ILI activity. It utilizes both sources of
information more efficiently than simply combining GFT with autoregressive terms as suggested
in [15], since GFT is not optimally aggregated to provide additional information on top of time
series information. Furthermore, we provide a quantitative efficiency metric that measures the
statistical significance of the improvement of our methodology over other alternatives. For example,
our method is twice as accurate as the method that combines GFT with autoregressive terms (see
Table 2). Finally, even though we use as input only the publicly available, low-quality data from
the Google Correlate and Google Trends websites, our method has significant improvement over
the latest version of GFT.

We name our model ARGO, which stands for AutoRegression with GOogle search data.
Statistically speaking, ARGO is an autoregressive model with Google search queries as exogenous
variables; ARGO also employs $L_1$ (and potentially $L_2$) regularization in order to achieve automatic
selection of the most relevant information.

1 Results 

Retrospective estimates of influenza activity (ILI activity level, as reported by the CDC) were
produced using our model, ARGO, for the time period of 2009-03-29 to 2015-07-11, assuming we had
access only to the historical CDC’s ILI reports up to the previous week of estimation. We compared
ARGO’s estimates with the ground truth: the CDC-reported weighted ILI activity level, published
typically with one or two weeks delay, by calculating a collection of accuracy metrics described
in the materials section. These metrics include the Root Mean Squared Error (RMSE), Mean
Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Correlation with estimation
target, and Correlation of increment with estimation target. For comparison, we calculated these
accuracy metrics for (a) GFT estimates (accessed on 2015-07-11), (b) estimates produced using the
method of Santillana et al. 2014 [16], (c) estimates produced by combining GFT with an AR(3)
autoregressive model [15], (d) estimates produced with an AR(3) autoregressive model |H[l5], and
(e) a naive method that simply uses the value of the prior week’s CDC’s ILI activity level as the
estimate for the current one. For fair comparison, all benchmark models (b - d) are dynamically
trained with a two-year moving window.

Table 1 summarizes these accuracy metrics for all estimation methods for multiple time periods.
The first column shows that ARGO’s estimates outperform all other alternatives, in every accuracy
metric for the whole time period. The other columns of Table 1 show the performance of all the
methods for the 2009 off-season H1N1 flu outbreak, and each regular flu season since 2010. The
panels of Figure 1 display the estimates against the observed CDC-reported ILI activity level.

Close inspection shows that, in the post-2009 regular flu seasons, ARGO uniformly outperformed
all other alternative estimation methods in terms of root mean squared error, mean absolute error,
mean absolute percentage error, and correlation. ARGO avoids the notorious over-shooting problem
of GFT, as seen in Figure 1. During the 2009 off-season H1N1 flu outbreak, ARGO had the smallest
mean absolute percentage error. In terms of root mean squared error and mean absolute error,
ARGO (relative RMSE = 0.640, relative MAE = 0.584) had the second best performance,
underperforming slightly only the GFT+AR(3) model (relative RMSE = 0.580, relative MAE = 0.570).
In terms of correlation, ARGO (r=98.5%) had similar performance to (the potentially in-sample
data of) GFT (r=98.9%) [2] and the GFT+AR(3) model (r=98.6%), while outperforming all the other
alternatives.

To assess the statistical significance of the improved prediction power of ARGO, we constructed
a 95% Confidence Interval for the relative efficiency of ARGO compared to other benchmark
methods. The Relative Efficiency of method 1 to method 2 is the ratio of the true Mean Squared Error
of method 2 to that of method 1 [35], which can be estimated by its observed value (see eq. (4)); its
confidence interval can be constructed by stationary bootstrap of the error residual time series [36].
Table 2 shows that ARGO is estimated to be at least twice as efficient as any other alternative and
the improvement in accuracy is highly statistically significant.
It is well-known that CDC reports undergo revisions, weeks after their initial publication,
that respond to internal consistency checks and lead to more accurate estimates of patients with 
ILI symptoms seeking medical attention. Thus, the available historical CDC information, in a 
given week, is not necessarily as accurate as it will be. We tested the effect of using (potentially 
inaccurate) unrevised information by obtaining the historical unrevised and revised reports, and 
the dates when the reports were revised, from the CDC website for the time period of our study. 
We used only the information that would have been available to us, at the time of estimation, 
and produced a time series of estimates for the whole time period described before. We compared 
our estimates to all other methods and found that ARGO still outperformed them all. Moreover, 
the values of all five accuracy metrics for ARGO essentially did not change, suggesting a desirable 
robustness to revisions in CDC’s ILI activity reports. The results are shown in Table S1 in the
Supporting Information. 

We faced an additional challenge in producing real-time estimates for the latest portion of the
2014-2015 flu season. At the time of writing this article, the only data available to us for the
week of March 28, 2015 and later came from the Google Trends website. The information from
Google Trends has even lower quality than that from Google Correlate and changes every week. These
undesired changes affected the quality of our estimates. In order to assess the stability of ARGO
in the presence of these variations in the data, we obtained the search frequencies of the same
query terms from the Google Trends website on 25 different days during the month of April 2015,
and produced a set of 25 historical estimates using ARGO. The results of the accuracy metrics
associated with these estimates are shown in Table S2 in the Supporting Information. This table
shows that, despite the observed variation in the Google Trends data, ARGO is threefold more
stable than the method of [16], and still outperforms on average any other method.

2 Discussion 

2.1 Strength of ARGO 

The results presented here demonstrate the superiority of our approach both in terms of accuracy
and robustness, when compared to all existing flu tracking models based on Google searches. The
value of these results is even higher given the fact that they were produced with low-quality input
variables. It is highly likely that our methodology would lead to even more accurate results if we
were given access to the input variables that Google uses to calculate their estimates.

The combination of seasonal flu information with dynamic reweighting of search information
appears to be a key factor in the enhanced accuracy of ARGO. The level of ILI activity last week
typically has a significant effect on the current level of ILI activity, and ILI activity half a year ago
and/or one year ago could provide further information, as shown in Figure S1 of the Supporting
Information, which reflects a strong temporal auto-correlation. The integration of time series
information leads to a smooth and continuous estimation curve and prevents undesired spikes. However,
simply adding GFT to an autoregressive model is suboptimal compared to ARGO, because simply
treating GFT as an individual variable is incapable of adjusting for time series information at the
resolution of individual query terms, and many terms included in GFT may no longer provide extra
information once time series information is incorporated. In fact, once the time series information
is included, fewer Google search query terms remain significant. For example, among 100 Google
Correlate query terms, ARGO selected 14 terms on average each week, whereas the method of
[16] and GFT [1] selected 38 and 45 terms each week on average, respectively. The combination
of ARGO’s smoothness and sparsity leads to a substantial reduction in the estimation error, as
observed in Tables 1 and 2, where ARGO shows improved performance in all evaluation metrics
Figure 1: Estimation results. The top panel shows the estimated ILI activity level from ARGO
(thick red), contrasting to the true CDC’s ILI activity level (thick black) as well as the estimates
from GFT (green), method of [16] (blue), GFT plus AR(3) model (dark yellow) and AR(3) model
(dashed grey). The two background shades, white and yellow, reflect two data sources, Google
Correlate and Google Trends, respectively. The dashed yellow vertical line separates Google
Correlate data with search terms identified on 2009-03-28 and 2010-05-22. The second panel shows the
estimation error, defined as estimated value minus the CDC’s ILI activity level. The small panels
labeled in alphabetical order are zoomed-in plots for estimation results in different study periods.
Panel (a) is the H1N1 flu outbreak period. Panel (b) is the 2012-13 regular flu season. Panel (c) is
the 2014-15 regular flu season. A regular flu season is defined as week 40 of one year to week 20 of
the following year.
                           Whole      Off-season  Regular flu seasons (week 40 to week 20 next year)
                           period     flu (H1N1)  2010-11    2011-12    2012-13    2013-14    2014-15

RMSE
  ARGO                     0.608      0.640       0.596      0.807      0.687      0.306      0.438
  GFT (Oct 2014)           2.216      0.773       1.110      3.023      4.451      0.986      0.700
  Santillana et al. (2014) 0.915      0.833       0.881      2.027      1.090      0.446      0.663
  GFT+AR(3)                0.912      0.580       0.602      1.382      1.279      0.993      0.906
  AR(3)                    0.957      0.813       0.794      1.051      1.191      0.969      0.928
  Naive                    1 (0.348)  1 (0.600)   1 (0.339)  1 (0.163)  1 (0.499)  1 (0.350)  1 (0.465)

MAE
  ARGO                     0.649      0.584       0.574      0.748      0.650      0.391      0.530
  GFT (Oct 2014)           1.834      0.777       1.260      3.277      5.028      0.891      0.770
  Santillana et al. (2014) 1.052      0.719       1.010      2.211      1.029      0.610      0.820
  GFT+AR(3)                0.888      0.570       0.613      1.308      1.016      1.034      0.839
  AR(3)                    0.925      0.777       0.787      0.951      0.988      0.917      0.934
  Naive                    1 (0.201)  1 (0.425)   1 (0.259)  1 (0.135)  1 (0.325)  1 (0.212)  1 (0.295)

MAPE
  ARGO                     0.787      0.620       0.663      0.770      0.719      0.453      0.620
  GFT (Oct 2014)           1.937      0.721       1.394      3.442      5.419      0.892      0.895
  Santillana et al. (2014) 1.381      0.765       1.380      2.306      1.251      0.754      0.958
  GFT+AR(3)                1.037      0.683       0.698      1.407      0.986      1.062      0.828
  AR(3)                    1.003      0.894       0.814      0.947      0.939      0.891      0.916
  Naive                    1 (0.090)  1 (0.139)   1 (0.105)  1 (0.081)  1 (0.110)  1 (0.084)  1 (0.097)

Correlation
  ARGO                     0.986      0.985       0.989      0.928      0.968      0.993      0.993
  GFT (Oct 2014)           0.875      0.989       0.968      0.833      0.926      0.969      0.986
  Santillana et al. (2014) 0.971      0.967       0.983      0.927      0.956      0.985      0.984
  GFT+AR(3)                0.967      0.986       0.985      0.879      0.929      0.945      0.957
  AR(3)                    0.964      0.968       0.971      0.877      0.903      0.927      0.945
  Naive                    0.961      0.951       0.954      0.887      0.924      0.923      0.937

Corr. of increment
  ARGO                     0.758      0.806       0.810      0.286      0.527      0.938      0.912
  GFT (Oct 2014)           0.706      0.863       0.702      0.484      0.502      0.847      0.918
  Santillana et al. (2014) 0.690      0.776       0.693      0.510      0.367      0.915      0.889
  GFT+AR(3)                0.512      0.708       0.708      0.165      0.141      0.534      0.587
  AR(3)                    0.385      0.585       0.569      0.077      0.011      0.404      0.493
  Naive                    0.436      0.602       0.570      0.095      0.134      0.406      0.514


Table 1: Comparison of different models for the estimation of influenza epidemics. GFT+AR(3)
stands for the model $p_t = \mu + \alpha_1 p_{t-1} + \alpha_2 p_{t-2} + \alpha_3 p_{t-3} + \beta\,\mathrm{GFT}(t)$,
where the GFT estimate is treated as an exogenous variable. Boldface highlights the best performance
for each metric in each study period. RMSE, MAE and MAPE are relative to the error of the naive
method; that is, the number reported is the ratio of the error of a given method to that of the naive
method. The absolute error of the naive method is reported in the parentheses. All comparisons are
based on the original scale of ILI activity level.


over the whole time period and is twice as efficient as GFT+AR(3).

Our methodology allows us to transparently understand how Google search information and
historical flu information complement one another. Time series models tend to be slow in response
to sudden observed changes in CDC’s ILI activity level. The AR(3) model shows this “delaying”
effect, despite its seemingly good correlation. Google searches, on the other hand, are better at
detecting sudden ILI activity changes, but are also very sensitive to the public’s over-reaction.

To investigate further the responsiveness (co-movement) of ARGO towards the change in ILI
activity, we calculated the correlation of increment between each estimation model and CDC’s
ILI activity level. The correlation of increment between two time series $a_t$ and $b_t$ is defined as
$\mathrm{Corr}(a_t - a_{t-1}, b_t - b_{t-1})$, which measures how well $a_t$ captures the changes in $b_t$. Table 1 shows
that ARGO has similar capability in capturing the changes in ILI level to that of GFT and the
method of [16], while outperforming the time series model AR(3) uniformly.

Time series information (seasonality) tends to pull ARGO’s estimate towards the historical level. 


                           point estimate   95% CI
GFT (Oct 2014)             12.85            [5.18, 91.82]
Santillana et al. (2014)   2.02             [1.36, 2.83]
GFT+AR(3)                  2.17             [1.23, 4.53]
AR(3)                      2.40             [1.56, 3.69]


Table 2: Estimate of Relative Efficiency of ARGO compared to other models with 95% Confidence
Interval (CI). Relative Efficiency being larger than one suggests increased predictive power of ARGO
compared to the alternative method.


This was evident at the onset of the off-season H1N1 flu outbreak (week ending at 05/02/2009),
which resulted in ARGO’s under-estimation. ARGO self-corrected its performance the following
week by shifting a portion of model weights from the time series domain to the Google searches
domain. Inversely, at the height of the 2012-13 season, ARGO, GFT and the method of [16] all
missed the peak due to an unprecedented surge of search activity. ARGO achieved the fastest
self-correction by redistributing the weights not only across Google terms but also across time series
terms, missing the peak by only 1 week, as opposed to 2 weeks for [16] and about 4 weeks for GFT.
It is important to note that while we have used CDC’s ILI as our gold standard for influenza activity
in the US population, and data from Google Correlate/Trends as our independent variables, our
methodology can be immediately adapted to any other suitable ILI gold standard and/or set of
independent variables.

2.2 Limitations and next steps 

While ARGO displays a clear superiority over previous methods, it is not fail-proof. Since it relies 
on the public’s search behavior, any abrupt changes to the inner workings of the search engine or any
changes in the way health-related search information is displayed to users will affect the accuracy 
of our methodology [371 EH]. We expect that ARGO will be fast at correcting itself if any such 
change takes place in the future. As in any predictive method, the quality of past performance does 
not guarantee the quality of future performance. In this article, we fixed the search query terms 
after 2010 so as to directly compare our results with GFT, which kept the same query terms since
2010; future application of ARGO may update search terms more frequently. ARGO can be easily 
generalized to any temporal and spatial scales for a variety of diseases or social events amenable 
to be tracked by internet searches or services OiaiHlElEniEIllMlIini. Further improvements 
in influenza prediction may come from combining multiple predictors constructed from disparate 
data sources [45]. After the submission of this article, Google announced that GFT would be
discontinued and that their raw data would be made accessible to selected scientific teams. This
announcement happened soon after the GFT team published a manuscript that proposed a new
time-series based method for the (now discontinued) GFT engine [44]. This new development makes
our contribution timely and useful in providing a transparent method for disease tracking in the 
future. 

3 Materials and Methods 

All data used in this article are publicly available. Therefore, IRB approval is not needed. 

3.1 Google Data 
To avoid forward-looking information in our out-of-sample predictions, and to make the search
term selection in our approach consistent with the main revision to GFT [14] immediately after the
H1N1 pandemic, we obtained the highest correlated terms to the CDC’s ILI using Google Correlate
(www.google.com/trends/correlate) for two different time periods. For the first time period
(pre-H1N1 period), we inserted only CDC’s ILI data from Jan 2004 to March 28, 2009 into Google
Correlate, and used the resulting most highly correlated search terms as independent variables for
our out-of-sample predictions for the time period April 4, 2009 - May 22, 2010. For the second
time period (post-H1N1), we inserted only CDC’s ILI data from Jan 2004 to May 22, 2010 into
Google Correlate to select new search terms as done in [T3]. These last search terms were used 
as independent variables for all subsequent predictions presented in this work. Tables S4 and S5
in the Supporting Information show all query terms identified. For the pre-H1N1 period (the first
time period), the terms from Google Correlate include spurious (or over-fitted) terms like “march
vacation” or “basketball standings”, as discussed in [15]. However, Figure S1 in the Supporting
Information shows that these spurious terms were often not selected by ARGO, i.e., ARGO would
give them zero weights, demonstrating its robustness. For the post-H1N1 time period, the updated
query terms from Google Correlate include mostly flu-related terms (see Table S5 in Supporting
Information). This suggests that spurious terms were “filtered-out” by including off-season flu data.
For the time period of March 28, 2015 up to the date of submission of this article, we acquired
search frequencies for this set of query terms from Google Trends (www.google.com/trends; date
of access: July 11, 2015) as Google Correlate only provides data up to March 28, 2015 at the time
of writing this article.

Google Correlate standardizes the search volume of each query to have mean zero and standard
deviation one across time and contains data only from 2004 to Mar 2015. To make Google Correlate
data compatible with Google Trends data, we linearly transformed the Google Correlate data to the
same scale of 0 to 100 in our analysis. We used Google Correlate data up to its last available date,
and then switched to Google Trends data afterwards. This is indicated in Figure 1 by different
shades of the background. We used the latest version of Google Flu Trends (4th version, revised
in Oct 2014) weekly estimates of ILI activity level as one of our comparison methods. GFT is
available at www.google.org/flutrends/us/data.txt (Date of access: 2015-07-11).
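
The linear rescaling of Google Correlate data mentioned above can be sketched as follows. The text does not state the exact linear map, so this sketch assumes a min-max transformation over the available history; the function name `rescale_to_0_100` is hypothetical and not taken from the paper.

```python
def rescale_to_0_100(standardized):
    """Linearly map a standardized (mean 0, sd 1) series onto [0, 100].

    Assumption: a min-max map over the supplied history; the paper only
    says the transformation is linear.
    """
    lo, hi = min(standardized), max(standardized)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [100.0 * (v - lo) / (hi - lo) for v in standardized]
```

Any other linear map onto the same range would serve equally well for a model with an intercept, since the later log-transformation is applied after the rescaling.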

3.2 CDC’s data 

We use the weighted version of CDC’s ILI activity level as the estimation target (available at
gis.cdc.gov/grasp/fluview/fluportaldashboard.html. Date of access: 2015-07-11). The weekly
revisions of CDC’s ILI are available at the CDC website for all recorded seasons (from week 40 of a given
year to week 20 of the subsequent year). For example, ILI report revision at week 50 of season
2012-13 is available at www.cdc.gov/flu/weekly/weeklyarchives2012-2013/data/senAllregt50.htm; ILI
report revision at week 9 of season 2014-15 is available at
www.cdc.gov/flu/weekly/weeklyarchives2014-2015/data/senAllregt09.html.

3.3 Formulation of our model 

Our model ARGO is motivated by a hidden Markov model. The logit-transformed CDC-reported
ILI activity level $\{y_t\}$ is the intrinsic time series of interest. We impose an autoregressive (AR)
model with lag $N$ on it, which implies that the collection of vectors $\{y_{(t-N+1):t}\}_{t \ge N}$ is a Markov
chain (this captures the clinical fact that flu lasts for a period, but not indefinitely). The vector of
log-transformed normalized volume of Google search queries at time $t$, $X_t$, depends only on the ILI
activity at the same time, $y_t$ (this follows the intuition that flu occurrence causes people to search
for flu-related information online). The Markovian property on the block $y_{(t-N+1):t}$ leads to the (vector)
hidden Markov model structure.

$$
\begin{array}{ccccccc}
y_{1:N} & \to & y_{2:(N+1)} & \to & \cdots & \to & y_{(T-N+1):T} \\
\downarrow & & \downarrow & & & & \downarrow \\
X_N & & X_{N+1} & & \cdots & & X_T
\end{array}
\tag{1}
$$


Our formal mathematical assumptions are:

(1) $y_t = \mu_y + \sum_{j=1}^{N} \alpha_j y_{t-j} + \epsilon_t$, $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$

(2) $X_t \mid y_t \sim \mathcal{N}_K(\mu_x + y_t \beta, Q)$

(3) Conditional on $y_t$, $X_t$ is independent of $\{y_l, X_l : l \neq t\}$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_K)^\top$, $\mu_x = (\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_K})^\top$, and $Q$ is the covariance
matrix. To make the variables more normal, we transform the original ILI activity level $p_t$ from
$[0,1]$ to $\mathbb{R}$ using the logit function, obtaining the $y_t$, and transform the Google search volumes
from $[0,100]$ to $\mathbb{R}$ using the log function, obtaining the $X_t$. The log function is appropriate because
Google search frequencies usually have exponential growth rate near peaks and are artificially scaled
to $[0, 100]$ by dividing by the running maximum. Since Google Trends is in integer scale from 0 to 100,
we add a small number $\delta = 0.5$ before the transformation to avoid taking the log of 0. The predictive
distribution $f(y_t \mid y_{1:(t-1)}, X_{1:t})$ is normal with mean linear in $y_{(t-N):(t-1)}$ and $X_t$ and constant
variance (see the Supporting Information). This observation leads to equation (2) below, which
defines the ARGO model.
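
The two variable transformations described above can be illustrated with a short sketch. The offset 0.5 is the value given in the text; the function names are hypothetical, and the sketch assumes the ILI activity level is supplied as a proportion in (0, 1), so a value reported in percent would first need dividing by 100.

```python
import math

DELTA = 0.5  # offset stated in the text; avoids log(0) for integer Google Trends values


def logit(p):
    """Map an ILI activity level p in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(y):
    """Map a logit-scale estimate back to the original ILI scale."""
    return 1.0 / (1.0 + math.exp(-y))


def log_search_volume(x, delta=DELTA):
    """Log-transform a search volume x in [0, 100] after adding delta."""
    return math.log(x + delta)
```

Estimates are produced on the logit scale and would be mapped back with `inv_logit` before computing accuracy metrics on the original scale of ILI activity level.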


3.4 The ARGO model 

Let $y_t = \mathrm{logit}(p_t)$ be the logit-transformed CDC’s (weighted) ILI activity level $p_t$ at time $t$, and
$X_{i,t}$ the log-transformed Google search frequency of term $i$ at time $t$. Our ARGO model is given
by

$$
y_t = \mu_y + \sum_{j=1}^{N} \alpha_j y_{t-j} + \sum_{i=1}^{K} \beta_i X_{i,t} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2), \tag{2}
$$

where $X_t$ can be thought of as the exogenous variables to the time series $\{y_t\}$.

3.5 Parameter estimation of ARGO model 

We chose $N = 52$ (weeks) to capture the within-year seasonality in ILI activity, and $K = 100$
(Google search terms) following the data availability from Google Correlate. Since we have more
independent variables than the number of observations, the usual maximum likelihood estimate
(ordinary least squares) method will fail. Therefore, we impose regularization for parameter
estimation. In general we have three kinds of penalties, $L_1$ penalty im, $L_2$ penalty [l 2 ], and a linear
combination of $L_1$ and $L_2$ penalties |l3]. All parameters are dynamically trained every week with
a 2-year (104 weeks) rolling window.

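In a given week, the goal is to find parameters $\mu_y$, $\alpha = (\alpha_1, \ldots, \alpha_{52})$, and
$\beta = (\beta_1, \ldots, \beta_{100})$ that minimize

$$
\sum_{t} \left( y_t - \mu_y - \sum_{j=1}^{52} \alpha_j y_{t-j} - \sum_{i=1}^{100} \beta_i X_{i,t} \right)^2
+ \lambda_\alpha \|\alpha\|_1 + \eta_\alpha \|\alpha\|_2^2 + \lambda_\beta \|\beta\|_1 + \eta_\beta \|\beta\|_2^2 \tag{3}
$$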
where $\lambda_\alpha$, $\lambda_\beta$, $\eta_\alpha$, $\eta_\beta$ are hyper-parameters. Ideally, we would like to use cross-validation to select
all 4 hyper-parameters. However, since we have only 104 training data points at a given week
due to the two-year moving window, the cross-validation result is highly noisy. Thus, we need
to pre-specify some of the hyper-parameters. For model simplicity and sparsity, combining with
the evidence seen from cross-validation, we set $\eta_\alpha = \eta_\beta = 0$, leading to $L_1$ penalization on both
autoregressive and Google search terms. With the remaining $\lambda_\alpha$ and $\lambda_\beta$, the cross-validation
results still have considerable variance. By the same sparsity and simplicity consideration, we
further constrained $\lambda_\alpha = \lambda_\beta$. Therefore, the ARGO model we finally propose is equation (3) with
constraint $\eta_\alpha = \eta_\beta = 0$ and $\lambda_\alpha = \lambda_\beta$. A detailed discussion of our specification of the
hyper-parameters is provided in the Supporting Information.
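
To make the final specification concrete, the following is a minimal sketch of a one-week ARGO-style estimate: an $L_1$-penalized least-squares fit on a rolling training window, with a single penalty weight shared by the autoregressive and search-term coefficients. It is an illustration under stated assumptions, not the authors' implementation: the solver (plain coordinate descent), the fixed iteration count, and all function names are choices made here, and the selection of the penalty weight `lam` by cross-validation is not shown. Inputs are assumed to be already transformed (`y` on the logit scale, `X[t]` the list of log-transformed search frequencies for week `t`).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def design_row(y, X, t, n_lags):
    """Predictors for week t: y[t-1], ..., y[t-n_lags], then the search terms X[t]."""
    return [y[t - j] for j in range(1, n_lags + 1)] + list(X[t])


def lasso_fit(rows, targets, lam, n_iter=200):
    """Minimize sum_t (y_t - mu - z_t.w)^2 + lam * ||w||_1; the intercept mu is unpenalized."""
    n, p = len(rows), len(rows[0])
    cols = [[rows[t][k] for t in range(n)] for k in range(p)]
    norms = [sum(c * c for c in col) for col in cols]
    w = [0.0] * p
    mu = sum(targets) / n
    resid = [v - mu for v in targets]  # current residual y - mu - Z w
    for _ in range(n_iter):
        shift = sum(resid) / n  # intercept update
        mu += shift
        resid = [r - shift for r in resid]
        for k in range(p):
            if norms[k] == 0.0:
                continue
            # inner product of column k with the residual that excludes its own contribution
            rho = sum(c * (r + c * w[k]) for c, r in zip(cols[k], resid))
            new_wk = soft_threshold(rho, lam / 2.0) / norms[k]
            step = new_wk - w[k]
            if step != 0.0:
                resid = [r - c * step for c, r in zip(cols[k], resid)]
                w[k] = new_wk
    return mu, w


def argo_nowcast(y, X, t, lam, n_lags=52, window=104):
    """Estimate y[t] from y[:t] and X[:t + 1], training on the preceding window weeks."""
    start = max(n_lags, t - window)
    rows = [design_row(y, X, s, n_lags) for s in range(start, t)]
    targets = [y[s] for s in range(start, t)]
    mu, w = lasso_fit(rows, targets, lam)
    return mu + sum(wk * zk for wk, zk in zip(w, design_row(y, X, t, n_lags)))
```

Calling `argo_nowcast` once per week with a fresh window mirrors the dynamic retraining described in the text; coefficients that the penalty sets exactly to zero correspond to lags or query terms that are not selected in that week.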

3.6 Accuracy metrics 

The Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute
Percentage Error (MAPE) of estimator $\hat{p}$ to the target ILI activity level $p$ are defined, respectively, as

$\mathrm{RMSE}(\hat{p}, p) = \left[ \frac{1}{n} \sum_{t=1}^{n} (\hat{p}_t - p_t)^2 \right]^{1/2}$, $\mathrm{MAE}(\hat{p}, p) = \frac{1}{n} \sum_{t=1}^{n} |\hat{p}_t - p_t|$, $\mathrm{MAPE}(\hat{p}, p) = \frac{1}{n} \sum_{t=1}^{n} |\hat{p}_t - p_t| / p_t$.

The correlation of estimator $\hat{p}$ to the target ILI activity level $p$ is their sample correlation
coefficient. The correlation of increment between $\hat{p}_t$ and $p_t$ is defined as
$\text{Corr. of increment}(\hat{p}, p) = \mathrm{Corr}(\hat{p}_t - \hat{p}_{t-1}, p_t - p_{t-1})$.
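
The five accuracy metrics defined above translate directly into code. The sketch below uses only the Python standard library; the function names are hypothetical. Dividing a method's RMSE, MAE, or MAPE by the same metric for the naive prior-week estimate gives the relative values of the kind reported in Table 1.

```python
import math


def rmse(est, truth):
    """Root mean squared error of est against truth."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(est, truth)) / len(truth))


def mae(est, truth):
    """Mean absolute error."""
    return sum(abs(e - p) for e, p in zip(est, truth)) / len(truth)


def mape(est, truth):
    """Mean absolute percentage error (truth values must be nonzero)."""
    return sum(abs(e - p) / p for e, p in zip(est, truth)) / len(truth)


def corr(a, b):
    """Sample (Pearson) correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def corr_of_increment(est, truth):
    """Correlation between the week-to-week changes of est and of truth."""
    d_est = [est[t] - est[t - 1] for t in range(1, len(est))]
    d_truth = [truth[t] - truth[t - 1] for t in range(1, len(truth))]
    return corr(d_est, d_truth)
```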

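The Relative Efficiency of estimator $\hat p^{(1)}$ to estimator $\hat p^{(2)}$ is
$e(\hat p^{(1)}, \hat p^{(2)}) = \mathrm{MSE}_{\mathrm{true}}(\hat p^{(2)}) / \mathrm{MSE}_{\mathrm{true}}(\hat p^{(1)})$, where
$\mathrm{MSE}_{\mathrm{true}}(\hat p^{(i)}) = E[(\hat p^{(i)} - p)^2]$, which can be estimated by

$$
\hat e = \frac{\mathrm{MSE}_{\mathrm{obs}}(\hat p^{(2)})}{\mathrm{MSE}_{\mathrm{obs}}(\hat p^{(1)})}, \quad \text{where} \quad
\mathrm{MSE}_{\mathrm{obs}}(\hat p^{(i)}) = \frac{1}{n} \sum_{t=1}^{n} \left( \hat p^{(i)}_t - p_t \right)^2. \tag{4}
$$

The 95% Confidence Interval can be constructed by the time series stationary bootstrap method [36],
where the replicated time series of the error residual is generated using geometrically distributed
random blocks with mean length 52 (which corresponds to one year). We obtain the basic bootstrap
confidence interval for $\log(\hat e)$ and then recover the original scale by exponentiation. The
non-parametric bootstrap confidence interval takes the autocorrelation and cross-correlation of the
errors into account, and is insensitive to the mean block length.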
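The bootstrap procedure described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the number of replicates, the seed, and the empirical-quantile convention are assumptions made here, and the efficiency is taken as the MSE of the second estimator divided by the MSE of the first, so values above 1 favor the first. Both error series are resampled with the same indices so that their cross-correlation is preserved.

```python
import math
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """One replicate of time indices: geometric block lengths, circular wrap-around."""
    p_new = 1.0 / mean_block
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p_new:
            idx.append(rng.randrange(n))       # start a new block at a random week
        else:
            idx.append((idx[-1] + 1) % n)      # continue the current block
    return idx


def log_efficiency(err1, err2):
    """log( MSE(err2) / MSE(err1) ) for two error series."""
    mse1 = sum(e * e for e in err1) / len(err1)
    mse2 = sum(e * e for e in err2) / len(err2)
    return math.log(mse2 / mse1)


def efficiency_ci(err1, err2, mean_block=52, n_boot=1000, level=0.95, seed=0):
    """Basic bootstrap interval for the relative efficiency, built on the log scale."""
    rng = random.Random(seed)
    n = len(err1)
    point = log_efficiency(err1, err2)
    reps = []
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        reps.append(log_efficiency([err1[i] for i in idx], [err2[i] for i in idx]))
    reps.sort()
    tail = (1.0 - level) / 2.0
    lower_q = reps[math.floor(tail * (n_boot - 1))]
    upper_q = reps[math.ceil((1.0 - tail) * (n_boot - 1))]
    # Basic interval: reflect the bootstrap quantiles around the point estimate.
    return math.exp(2.0 * point - upper_q), math.exp(2.0 * point - lower_q)
```

Working on the log scale keeps the interval positive after exponentiation, which a basic interval computed directly on the ratio would not guarantee.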
Supporting Information

A SI Methods and Robustness Analysis 

Details of our methodology are presented as follows. First, the predictive distribution in the
formulation of the ARGO model and the corresponding assumptions are described; second, the
statistical strategy to determine the hyper-parameters of the ARGO model is explained; third, the
results of two sensitivity analyses testing the robustness of the ARGO methodology are presented,
(a) with respect to subsequent revisions of CDC’s ILI activity reports, and (b) with respect to
observed variation of the input variables coming from Google Trends data; fourth, the exact
search query terms identified by Google Correlate with different data access dates are presented;
fifth, a heatmap showing the coefficients for the time series and Google search terms dynamically
trained by ARGO is included.


B Predictive distribution in the formulation of ARGO model 


To improve normality for both the input variables and the dependent variables, the CDC-reported
ILI activity level was logit-transformed, and the linearly normalized volumes of Google search queries
were log-transformed. To avoid taking the log of 0, we add a small number $\delta = 0.5$ before the
log-transformation. These transformations led to two sets of variables: the intrinsic (influenza
epidemics activity) time series of interest $\{y_t\}$, and the (Google search) variable vector $X_t$ at time
$t$ (that depends only on $y_t$). Our formal mathematical assumptions are:

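1. $y_t = \mu_y + \sum_{j=1}^{N} \alpha_j y_{t-j} + \epsilon_t$, with $\epsilon_t \overset{iid}{\sim} N(0, \sigma^2)$

2. $X_t \mid y_t \sim N_K(\mu_x + y_t \beta, Q)$

3. Conditional on $y_t$, $X_t$ is independent of $\{y_l, X_l : l \neq t\}$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_K)^T$, $\mu_x = (\mu_{x1}, \mu_{x2}, \ldots, \mu_{xK})^T$, and $Q$ is the covariance matrix. The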
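predictive distribution $f(y_{t+1} \mid y_{1:t}, X_{1:(t+1)})$ is given by

$$
N\left( \mu_{t+1} + \frac{\beta^T Q^{-1} \left( X_{t+1} - \mu_x - \mu_{t+1} \beta \right)}{\frac{1}{\sigma^2} + \beta^T Q^{-1} \beta}, \;\;
\left( \frac{1}{\sigma^2} + \beta^T Q^{-1} \beta \right)^{-1} \right), \quad
\mu_{t+1} = \mu_y + \sum_{j=1}^{N} \alpha_j y_{t+1-j}, \tag{5}
$$

which is a normal distribution, whose mean is a linear combination of $y_{(t+1-N):t}$ and $X_{t+1}$, and
whose variance is a constant.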
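For readers who want the intermediate step, the following sketch shows how a normal predictive distribution with constant variance follows from assumptions 1 to 3. The shorthand $m$ for the autoregressive mean is introduced here only for this derivation. By assumption 3, $X_{1:t}$ carries no information about $y_{t+1}$ once $y_{1:t}$ is given, and $X_{t+1}$ depends on the past only through $y_{t+1}$, so

$$
f(y_{t+1} \mid y_{1:t}, X_{1:(t+1)}) \;\propto\; f(y_{t+1} \mid y_{1:t}) \, f(X_{t+1} \mid y_{t+1})
\;\propto\; \exp\left\{ -\frac{(y_{t+1} - m)^2}{2\sigma^2} - \frac{1}{2} (X_{t+1} - \mu_x - y_{t+1}\beta)^T Q^{-1} (X_{t+1} - \mu_x - y_{t+1}\beta) \right\},
\quad m = \mu_y + \sum_{j=1}^{N} \alpha_j y_{t+1-j}.
$$

Collecting the terms in $y_{t+1}$ and completing the square gives

$$
-\frac{1}{2} \left( \frac{1}{\sigma^2} + \beta^T Q^{-1} \beta \right) y_{t+1}^2
+ \left( \frac{m}{\sigma^2} + \beta^T Q^{-1} (X_{t+1} - \mu_x) \right) y_{t+1} + \text{const},
$$

which is the kernel of a normal density with variance $\left( 1/\sigma^2 + \beta^T Q^{-1} \beta \right)^{-1}$ and mean

$$
\frac{m/\sigma^2 + \beta^T Q^{-1} (X_{t+1} - \mu_x)}{1/\sigma^2 + \beta^T Q^{-1} \beta}
= m + \frac{\beta^T Q^{-1} (X_{t+1} - \mu_x - m\beta)}{1/\sigma^2 + \beta^T Q^{-1} \beta}.
$$

The mean is linear in the lagged $y$ values (through $m$) and in $X_{t+1}$, which is what motivates a linear regression of $y$ on its own lags and the search-term vector.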


C Determination of the hyper-parameters for ARGO

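The optimized parameters of the ARGO model, $\mu_y$, $\alpha = (\alpha_1, \ldots, \alpha_N)$, $\beta = (\beta_1, \ldots, \beta_K)$, are obtained
by

$$
\underset{\mu_y, \alpha, \beta}{\arg\min} \; \sum_t \left( y_t - \mu_y - \sum_{j=1}^{52} \alpha_j y_{t-j} - \sum_{i=1}^{100} \beta_i X_{i,t} \right)^2
+ \lambda_\alpha \|\alpha\|_1 + \eta_\alpha \|\alpha\|_2^2 + \lambda_\beta \|\beta\|_1 + \eta_\beta \|\beta\|_2^2. \tag{6}
$$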

The training period consists of a two-year (104 weeks) rolling window that immediately precedes
the desired date of estimation. The hyper-parameters are $\lambda_\alpha$, $\lambda_\beta$, $\eta_\alpha$, and $\eta_\beta$. We tested the
performance of ARGO with the following specifications of hyper-parameters:

1. Restrict $\eta_\alpha = \eta_\beta = 0$ and $\lambda_\alpha = \lambda_\beta$, cross-validate on $\lambda_\alpha$. This is our proposed ARGO with
the same $L_1$ penalty for Google search terms and autoregressive lags.

2. Restrict $\eta_\alpha = \eta_\beta = 0$, cross-validate on $(\lambda_\alpha, \lambda_\beta)$. This is ARGO with separate $L_1$ penalties
for Google search terms and autoregressive lags.

3. Restrict $\eta_\alpha = \eta_\beta$ and $\lambda_\alpha = \lambda_\beta = 0$, cross-validate on $\eta_\alpha$. This is ARGO with the same $L_2$
penalty for Google search terms and autoregressive lags.

4. Restrict $\lambda_\alpha = \lambda_\beta = 0$, cross-validate on $(\eta_\alpha, \eta_\beta)$. This is ARGO with separate $L_2$ penalties
for Google search terms and autoregressive lags.

5. Restrict $\lambda_\alpha = \lambda_\beta$ and $\eta_\alpha = \eta_\beta$, cross-validate on $(\lambda_\alpha, \eta_\alpha)$. This is ARGO with the same elastic
net (both $L_1$ and $L_2$) penalty for Google search terms and autoregressive lags.

Table 5 summarizes the in-sample estimation performance for our proposed ARGO, together with
the other specifications of hyper-parameters. It is apparent from the table that the $L_1$ penalty
generally outperforms the $L_2$ penalty. The $L_1$ penalty tends to shrink the coefficients of unnecessary
independent variables to be exactly zero, and thus eliminates redundant information; on the other
hand, the $L_2$ penalty can only shrink the coefficients to be close to zero. As a result, $L_2$-penalized
coefficients are not as sparse as their $L_1$ counterparts. Furthermore, from Table 5 we see that
ARGO with separate $L_1$ penalties (Specification 2) outperforms ARGO with separate $L_2$ penalties
(Specification 4), in terms of both root mean squared error and mean absolute error. Similarly,
ARGO with the same $L_1$ penalty (Specification 1) outperforms ARGO with the same $L_2$ penalty
(Specification 3), in terms of both root mean squared error and mean absolute error.
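A one-coefficient worked example, added here purely as an illustration, shows why the two penalties behave so differently. Suppose a single predictor $z_t$ is scaled so that $\sum_t z_t^2 = 1$, and let $b = \sum_t z_t y_t$ be its least-squares coefficient; the symbols $z_t$, $b$, and $w$ are introduced only for this sketch. Since $\sum_t (y_t - w z_t)^2 = (b - w)^2 + \text{const}$, the penalized solutions have closed forms:

$$
\hat w_{L_1} = \arg\min_w \left\{ (b - w)^2 + \lambda |w| \right\} = \mathrm{sign}(b) \max\left( |b| - \tfrac{\lambda}{2}, \, 0 \right),
\qquad
\hat w_{L_2} = \arg\min_w \left\{ (b - w)^2 + \eta w^2 \right\} = \frac{b}{1 + \eta}.
$$

The $L_1$ solution is exactly zero whenever $|b| \le \lambda/2$, whereas the $L_2$ solution is zero only if $b = 0$. For instance, with $b = 0.3$ and $\lambda = \eta = 1$, the $L_1$ estimate is $0$ while the $L_2$ estimate is $0.15$.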

The elastic net model, which combines the $L_1$ penalty and the $L_2$ penalty, does not provide any error
reduction. In the cross-validation process of setting $(\lambda_\alpha, \eta_\alpha)$ for the elastic net model, 70 weeks
out of 116 in-sample weeks showed that the smallest cross-validation mean error when restricting
$\eta_\alpha = 0$ (i.e., zero $L_2$ penalty) is within one standard deviation of the global smallest cross-validation
mean error, suggesting that restricting the $L_2$ penalty term to be zero (i.e., $\eta_\alpha = 0$) will introduce little
bias. Therefore, for the simplicity and sparsity of the model, we drop the $L_2$ penalty terms and
use only the $L_1$ penalty.

Next we want to decide between the remaining two specifications, ARGO with separate $L_1$
penalties (Specification 2), and ARGO with the same $L_1$ penalty (Specification 1). One might
argue that Google search terms and autoregressive lags are different sources of information and
thus should have different $L_1$ penalties. However, empirical evidence in Table 5 shows that, again,
giving extra flexibility to $(\lambda_\alpha, \lambda_\beta)$ does not generate improvement compared to fixing $\lambda_\alpha = \lambda_\beta$.
In the cross-validation process of setting $(\lambda_\alpha, \lambda_\beta)$ for separate $L_1$ penalties, 99 weeks out of 116
in-sample weeks showed that the smallest cross-validation mean error when restricting $\lambda_\alpha = \lambda_\beta$
(i.e., same $L_1$ penalty) is within one standard deviation of the global smallest cross-validation mean
error. This may well be due to the gain from variance reduction when imposing the restriction
$\lambda_\alpha = \lambda_\beta$. Based on the same simplicity and sparsity consideration, we finally decided to restrict
$\eta_\alpha = \eta_\beta = 0$ and $\lambda_\alpha = \lambda_\beta$ in the setting of hyper-parameters for ARGO.
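The weekly check used in the two comparisons above (is the best restricted setting within one standard deviation of the global best?) can be sketched as follows. The data layout and the function name are hypothetical, and the standard deviation is assumed to be the one attached to the global minimizer's cross-validation error, which the text does not spell out.

```python
def restricted_within_one_sd(cv_results, is_restricted):
    """Check whether a restricted hyper-parameter grid loses little in cross-validation.

    cv_results maps a hyper-parameter tuple to (mean_cv_error, sd_cv_error);
    is_restricted(params) returns True for settings that satisfy the restriction.
    """
    best_params = min(cv_results, key=lambda params: cv_results[params][0])
    best_mean, best_sd = cv_results[best_params]
    restricted_best = min(
        mean for params, (mean, _) in cv_results.items() if is_restricted(params)
    )
    return restricted_best <= best_mean + best_sd


# Toy grid over (lambda_alpha, lambda_beta); the numbers are made up for illustration.
toy_grid = {
    (0.1, 0.1): (1.00, 0.20),
    (0.1, 0.2): (0.90, 0.15),
    (0.2, 0.2): (1.20, 0.10),
}
same_penalty = lambda params: params[0] == params[1]
print(restricted_within_one_sd(toy_grid, same_penalty))
```

Counting the weeks for which this check returns `True` gives a tally of the kind quoted in the text (70 of 116 weeks for the elastic net comparison, 99 of 116 for the separate-penalty comparison).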

D Revision of CDC’s ILI activity reports 

Within a flu season, CDC reports are constantly revised to improve their accuracy as new information
is incorporated. Thus, CDC’s weighted ILI figures displayed in previously published reports
may change in subsequent weeks. As a consequence, in a given week the available CDC ILI
information from the most recent weeks may be inaccurate. To test the robustness of ARGO in the
presence of these revisions and mimic the real-time tracking in our retrospective predictions, we
trained ARGO and all other alternative models based on the following schedule.

Suppose $z_{i,j}$ is the CDC-reported ILI activity level of week $i$ accessed at week $j$. Since CDC’s
ILI activity report is typically delayed by one week, at week $j$ the historical ILI activity level data
we have is $\{z_{i,j} : i \le j-1\}$. Due to revisions, the ILI activity levels of week $i$ accessed at different
weeks, $z_{i,i+1}, z_{i,i+2}, \ldots$, may differ but will eventually converge to a finalized value $z_{i,\infty}$. Hence, to
avoid using forward-looking information, in week $j$ we train all models with the ILI activity level
accessed at that week, $\{z_{i,j} : i \le j-1\}$. In this sense, any future revision beyond week $j$ will not be
incorporated in the training at week $j$. Yet for the accuracy metrics, the estimation target remains
the finalized ILI activity level $\{z_{i,\infty}, i = 1, 2, \ldots\}$.

Table 3 shows the estimation results when using the aforementioned schedule. Note that ARGO
still outperforms all other alternative models. Moreover, the absolute values of all four accuracy
metrics for ARGO trained this way essentially do not change compared to ARGO trained with
the finalized ILI activity level in the main text, indicating the robustness of ARGO.

The weekly revisions of CDC’s ILI activity reports are available at the CDC website from week 40 of
the year to week 20 of the subsequent year for all seasons studied in this article. For example, ILI
activity level revisions at week 50 of season 2012-2013 are available at
http://www.cdc.gov/flu/weekly/weeklyarchives2012-2013/data/senAllregt50.htm; the ILI activity report
revision at week 9 of season 2014-2015 is available at
http://www.cdc.gov/flu/weekly/weeklyarchives2014-2015/data/senAllregt09.html
(the webpage has suffix “htm” for seasons before 2014-2015 and suffix “html” for the 2014-2015 season).

In this retrospective case study, when the revisions of ILI activity level were not available for a
particular week during the off-season period, the finalized ILI activity level was used instead.
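The training schedule above amounts to a vintage lookup with a fallback. The sketch below uses a hypothetical data layout (`vintages[j][i]` for the level of week `i` as reported at access week `j`, and `finalized[i]` for the eventual value) and zero-based week indices; it also assumes that a missing entry, or a missing access week altogether, falls back to the finalized value.

```python
def training_series(vintages, finalized, week_j):
    """ILI levels usable for training at week j without looking ahead.

    Returns the levels of weeks 0 .. j-1 as they were reported at access week j.
    Where no revision was archived, the finalized value is used instead.
    """
    snapshot = vintages.get(week_j, {})
    return [snapshot.get(i, finalized[i]) for i in range(week_j)]


# Toy example: only the snapshot taken at week 3 was archived.
finalized = [1.0, 2.0, 3.0, 4.0]
vintages = {3: {0: 1.0, 1: 2.1, 2: 2.7}}
print(training_series(vintages, finalized, 3))  # values as first seen at week 3
print(training_series(vintages, finalized, 2))  # no snapshot archived: finalized values
```

The accuracy metrics would still be computed against `finalized`, so a model trained this way is scored on the same target as one trained on finalized data.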

E Variations of Google Trends data 

Google Trends historical data constantly change as a consequence of re-normalizations and
algorithm updates. To study the robustness of ARGO to Google Trends data revisions, we
obtained the search frequencies of the search query terms identified by Google Correlate on May
22, 2010 (see Figure 2 in the main text and Table 7 below) from the Google Trends website
(http://www.google.com/trends) on 25 different days in April 2015. We studied the variability
of ARGO’s performance when using these 25 different versions of Google Trends data as input
variables for the common time period of Sep 28, 2014 to Mar 29, 2015. We studied the 2014-15 flu
season only partially (up to March 2015) because this is the longest study period covered by all
the obtained versions of Google Trends data at the time (May 1, 2015) of the first submission of
this article. We want to emphasize that Google Correlate data were only available up to Feb 2014
when accessed in April 2015.

Despite the inevitable variation due to revisions of the low-quality data from Google Trends,
ARGO still achieves considerable stability compared to the method of Santillana et al. [16] during
this time period. Table 4 suggests that ARGO is threefold more robust than the method of [16].
The incorporation of time series information helps ARGO achieve this stability. As an extreme
example, the AR(3) model focuses entirely on the time series information and is thus independent of
Google Trends data revisions. GFT, formulated with the original search variables as inputs, is by
construction insensitive to the changes in Google Trends data. For this portion of the study, we
included the signal from GFT for context only and we treat it as exogenous in our analysis. Based
on the results from previous time periods, it is highly likely that if we had access to Google’s internal
raw data (i.e., historical search volume for disease-related phrases) we would have achieved the same
stability as well. Yet even with these low-quality data, ARGO outperforms GFT uniformly on all
versions of data in terms of both root mean squared error and mean absolute error.
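To make the robustness comparison concrete, the short calculation below divides the standard deviations reported in Table 4 for the method of Santillana et al. by those reported for ARGO, metric by metric. The numbers are copied from Table 4; the variable and function names are hypothetical, and reading "robustness" as a ratio of standard deviations is an interpretation made here.

```python
# Standard deviations across the 25 Google Trends versions, copied from Table 4.
sd_argo = {
    "RMSE": 0.013, "MAE": 0.017, "MAPE": 0.005,
    "Correlation": 0.002, "Corr. of increment": 0.016,
}
sd_santillana = {
    "RMSE": 0.029, "MAE": 0.049, "MAPE": 0.013,
    "Correlation": 0.005, "Corr. of increment": 0.050,
}


def sd_ratios(sd_reference, sd_model):
    """Ratio of the reference method's variability to the model's, per metric."""
    return {metric: sd_reference[metric] / sd_model[metric] for metric in sd_model}


ratios = sd_ratios(sd_santillana, sd_argo)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}")
```

Under this reading the ratios fall between roughly 2.2 and 3.1, depending on the metric.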

F Detailed description of Google Correlate data 

Tables 6 and 7 list the search query phrases identified by Google Correlate as of March 28, 2009
and May 22, 2010, respectively. The March 2009 version included spurious terms such as
“college.basketball.standings”, “march.vacation”, “aloha.ski”, “virginia.wrestling”, etc. These spurious
terms did not appear in the May 2010 version.

G Dynamic coefficients for ARGO 

Figure 2 shows, via a heatmap, the coefficients for the time series and Google search terms
dynamically trained by ARGO. The level of ILI activity last week is seen to have a significant effect on
the current level of ILI activity, and ILI activity half a year ago and/or one year ago could provide
further information, as the figure shows. Among Google Correlate query terms, ARGO selected 14
terms out of 100 on average each week.
Columns 2010-11 through 2014-15 are regular flu seasons (week 40 to week 20 of the next year).

| Metric / Method | Whole period (03/29/09-07/18/15) | Off-season flu H1N1 (03/29/09-12/27/09) | 2010-11 (10/03/10-05/22/11) | 2011-12 (10/02/11-05/20/12) | 2012-13 (09/30/12-05/19/13) | 2013-14 (09/29/13-05/18/14) | 2014-15 (09/28/14-05/17/15) |
|---|---|---|---|---|---|---|---|
| RMSE | | | | | | | |
| ARGO | 0.565 | 0.630 | 0.509 | 0.608 | 0.622 | 0.298 | 0.434 |
| GFT (Oct 2014) | 2.003 | 0.702 | 0.971 | 1.878 | 4.387 | 0.885 | 0.714 |
| Santillana et al. (2014) | 0.897 | 0.858 | 0.760 | 1.179 | 1.248 | 0.373 | 0.691 |
| GFT+AR(3) | 0.825 | 0.530 | 0.616 | 0.680 | 1.168 | 0.981 | 0.898 |
| AR(3) | 0.963 | 0.805 | 0.986 | 1.136 | 1.087 | 0.946 | 0.931 |
| Naive | 1.000 (0.385) | 1.000 (0.661) | 1.000 (0.388) | 1.000 (0.263) | 1.000 (0.506) | 1.000 (0.391) | 1.000 (0.456) |
| MAE | | | | | | | |
| ARGO | 0.557 | 0.595 | 0.483 | 0.555 | 0.627 | 0.339 | 0.501 |
| GFT (Oct 2014) | 1.465 | 0.670 | 1.093 | 2.026 | 5.082 | 0.747 | 0.787 |
| Santillana et al. (2014) | 0.865 | 0.723 | 0.875 | 1.283 | 1.087 | 0.472 | 0.847 |
| GFT+AR(3) | 0.790 | 0.485 | 0.672 | 0.643 | 1.000 | 1.036 | 0.890 |
| AR(3) | 0.999 | 0.808 | 0.982 | 1.158 | 1.094 | 0.943 | 0.920 |
| Naive | 1.000 (0.252) | 1.000 (0.494) | 1.000 (0.299) | 1.000 (0.218) | 1.000 (0.322) | 1.000 (0.253) | 1.000 (0.289) |
| MAPE | | | | | | | |
| ARGO | 0.587 | 0.587 | 0.511 | 0.560 | 0.588 | 0.350 | 0.582 |
| GFT (Oct 2014) | 1.350 | 0.603 | 1.163 | 2.163 | 4.827 | 0.688 | 0.906 |
| Santillana et al. (2014) | 0.970 | 0.709 | 1.141 | 1.363 | 1.143 | 0.545 | 0.937 |
| GFT+AR(3) | 0.848 | 0.599 | 0.749 | 0.669 | 0.819 | 1.068 | 0.964 |
| AR(3) | 1.067 | 0.915 | 1.051 | 1.169 | 1.050 | 0.945 | 0.935 |
| Naive | 1.000 (0.129) | 1.000 (0.166) | 1.000 (0.126) | 1.000 (0.129) | 1.000 (0.123) | 1.000 (0.108) | 1.000 (0.095) |
| Correlation | | | | | | | |
| ARGO | 0.985 | 0.979 | 0.988 | 0.911 | 0.971 | 0.992 | 0.992 |
| GFT (Oct 2014) | 0.875 | 0.989 | 0.968 | 0.833 | 0.926 | 0.969 | 0.986 |
| Santillana et al. (2014) | 0.965 | 0.956 | 0.985 | 0.937 | 0.938 | 0.987 | 0.973 |
| GFT+AR(3) | 0.971 | 0.984 | 0.983 | 0.853 | 0.931 | 0.943 | 0.960 |
| AR(3) | 0.961 | 0.965 | 0.955 | 0.815 | 0.921 | 0.920 | 0.953 |
| Naive | 0.956 | 0.943 | 0.946 | 0.828 | 0.928 | 0.910 | 0.945 |
| Corr. of increment | | | | | | | |
| ARGO | 0.742 | 0.751 | 0.772 | 0.262 | 0.633 | 0.898 | 0.892 |
| GFT (Oct 2014) | 0.706 | 0.863 | 0.702 | 0.484 | 0.502 | 0.847 | 0.918 |
| Santillana et al. (2014) | 0.625 | 0.680 | 0.719 | 0.619 | 0.293 | 0.917 | 0.837 |
| GFT+AR(3) | 0.536 | 0.703 | 0.703 | 0.155 | 0.220 | 0.514 | 0.621 |
| AR(3) | 0.420 | 0.562 | 0.554 | 0.067 | 0.106 | 0.360 | 0.549 |
| Naive | 0.455 | 0.552 | 0.556 | 0.162 | 0.247 | 0.345 | 0.586 |

Table 3: Comparison of different models for the estimation of influenza epidemics, with weekly
CDC’s ILI activity level that excludes forward-looking information from ILI activity report revision.
The estimation target is the finalized CDC’s ILI activity level. RMSE, MAE and MAPE are relative
to the error of the naive method. The absolute error of the naive method is reported in parentheses.
| Method | RMSE | MAE | MAPE | Correlation | Corr. of increment |
|---|---|---|---|---|---|
| Mean | | | | | |
| ARGO | 0.226 | 0.304 | 0.079 | 0.981 | 0.831 |
| GFT (Oct 2014) | 0.262 | 0.366 | 0.089 | 0.985 | 0.920 |
| Santillana et al. (2014) | 0.306 | 0.398 | 0.116 | 0.973 | 0.803 |
| GFT+AR(3) | 0.303 | 0.482 | 0.090 | 0.948 | 0.581 |
| AR(3) | 0.332 | 0.492 | 0.096 | 0.936 | 0.492 |
| Standard Deviation | | | | | |
| ARGO | 0.013 | 0.017 | 0.005 | 0.002 | 0.016 |
| GFT (Oct 2014) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Santillana et al. (2014) | 0.029 | 0.049 | 0.013 | 0.005 | 0.050 |
| GFT+AR(3) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| AR(3) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

Table 4: Mean and Standard Deviation of accuracy metrics when using Google Trends data accessed
at different dates. The common study period is the 2014-15 partial season (Sep 28, 2014 to Mar 29,
2015). At the time of first submitting this article, Google Correlate data covered only up to Feb
2014, which inspired us to study the robustness of ARGO with respect to Google Trends data
variability on the 2014-15 season.
| Metric / Specification | Whole in-sample period (01/07/07-03/29/09) | 2006-07 partial season (01/07/07-05/20/07) | 2007-08 season (09/30/07-05/18/08) | 2008-09 partial season (09/28/08-03/29/09) |
|---|---|---|---|---|
| RMSE | | | | |
| ARGO w/ same $L_1$ | 0.644 | 0.697 | 0.602 | 0.653 |
| ARGO w/ sep. $L_1$ | 0.658 | 0.672 | 0.637 | 0.629 |
| ARGO w/ same $L_2$ | 1.165 | 0.817 | 1.175 | 1.243 |
| ARGO w/ sep. $L_2$ | 1.010 | 0.740 | 0.946 | 1.173 |
| ARGO w/ ElasticNet | 0.669 | 0.757 | 0.585 | 0.766 |
| Naive | 1.000 (0.316) | 1.000 (0.286) | 1.000 (0.473) | 1.000 (0.304) |
| MAE | | | | |
| ARGO w/ same $L_1$ | 0.678 | 0.651 | 0.584 | 0.634 |
| ARGO w/ sep. $L_1$ | 0.691 | 0.671 | 0.621 | 0.593 |
| ARGO w/ same $L_2$ | 1.223 | 0.836 | 1.094 | 1.469 |
| ARGO w/ sep. $L_2$ | 1.149 | 0.753 | 0.943 | 1.401 |
| ARGO w/ ElasticNet | 0.738 | 0.718 | 0.613 | 0.780 |
| Naive | 1.000 (0.206) | 1.000 (0.245) | 1.000 (0.335) | 1.000 (0.226) |
| Correlation | | | | |
| ARGO w/ same $L_1$ | 0.987 | 0.977 | 0.983 | 0.977 |
| ARGO w/ sep. $L_1$ | 0.986 | 0.980 | 0.980 | 0.976 |
| ARGO w/ same $L_2$ | 0.969 | 0.984 | 0.976 | 0.955 |
| ARGO w/ sep. $L_2$ | 0.979 | 0.987 | 0.983 | 0.967 |
| ARGO w/ ElasticNet | 0.987 | 0.984 | 0.986 | 0.975 |
| Naive | 0.965 | 0.949 | 0.950 | 0.935 |
| Corr. of increment | | | | |
| ARGO w/ same $L_1$ | 0.779 | 0.643 | 0.857 | 0.646 |
| ARGO w/ sep. $L_1$ | 0.708 | 0.545 | 0.758 | 0.697 |
| ARGO w/ same $L_2$ | 0.828 | 0.793 | 0.864 | 0.799 |
| ARGO w/ sep. $L_2$ | 0.845 | 0.795 | 0.881 | 0.824 |
| ARGO w/ ElasticNet | 0.814 | 0.835 | 0.852 | 0.738 |
| Naive | 0.623 | 0.473 | 0.756 | 0.322 |

Table 5: Comparison of different specifications of hyper-parameters for the in-sample study period.
“ARGO w/ same $L_1$” is ARGO with the same $L_1$ penalty for Google search terms and autoregressive
lags (Specification 1). “ARGO w/ sep. $L_1$” is ARGO with separate $L_1$ penalties for Google search
terms and autoregressive lags (Specification 2). “ARGO w/ same $L_2$” is ARGO with the same $L_2$
penalty for Google search terms and autoregressive lags (Specification 3). “ARGO w/ sep. $L_2$” is
ARGO with separate $L_2$ penalties for Google search terms and autoregressive lags (Specification
4). “ARGO w/ ElasticNet” is ARGO with the same elastic net penalty for Google search terms
and autoregressive lags (Specification 5). The first column is for the entire in-sample study period.
The second column is for the 2006-07 partial season; the 2006-07 full season is not available because data
prior to Jan 2007 is used for training. The third column is for the 2007-08 full season. The fourth
column is for the 2008-09 partial season; the 2008-09 full season is not available because our out-of-sample
study period starts in Apr 2009. RMSE and MAE are relative to the error of the naive method. The
absolute error of the naive method is reported in parentheses.
influenza.type.a
flu.incubation
bronchitis
influenza.contagious
flu.fever
influenza.a
influenza.incubation
flu.contagious
treating.the.flu
type.a.influenza
symptoms.of.the.flu
influenza.symptoms
flu.duration
flu.report
symptoms.of.flu
influenza.incubation.period
how.to.treat.the.flu
treat.the.flu
symptoms.of.bronchitis
flu.treatment
symptoms.of.influenza
treating.flu
flu.in.children
fever.reducer
cold.or.flu

painful.cough
fever.flu
over.the.counter.flu
pneumonia
how.long.is.the.flu
flu.how.long
treatment.for.flu
fever.cough
flu.medicine
dangerous.fever
high.fever
is.flu.contagious
normal.body
normal.body.temperature
how.long.does.the.flu.last
symptoms.of.pneumonia
signs.of.the.flu
flu.vs.cold
low.body
cough.fever
vegas.shows.march
is.the.flu.contagious
type.a.flu
flu.treatments
remedies.for.the.flu

treatment.for.the.flu
basketball.standing
flu.test
tussionex
reduce.a.fever
how.long.is.the.flu.contagious
treat.flu
spring.break.family
las.vegas.shows.march
how.to.reduce.a.fever
flu.or.cold
incubation.period.for.the.flu
harlem.globe
tussin
basketball.standings
sinus
upper.respiratory
get.over.the.flu
acute.bronchitis
body.temperature
college.basketball.standings
strep
march.weather
getting.over.the.flu
march.vacation

weather.march
fevers
duration.of.flu
flu.contagious.period
cold.vs.flu
cure.the.flu
walking.pneumonia
flu.vs..cold
length.of.flu
influenza.a.and.b
flu.and.pregnancy
sinus.infections
influenza.treatment
jiminy.peak.ski
baseball.preseason
spring.break.date
indoor.driving
z.pack
college.spring.break.dates
aloha.ski
concerts.in.march
break.a.fever
influenza.duration
robitussin
virginia.wrestling

Table 6: All search phrases identified by Google Correlate using data as of 2009-03-28.
influenza.type.a
symptoms.of.flu
flu.duration
flu.contagious
flu.fever
treat.the.flu
how.to.treat.the.flu
signs.of.the.flu
over.the.counter.flu
how.long.is.the.flu
symptoms.of.the.flu
flu.recovery
cold.or.flu
flu.medicine
flu.or.cold
normal.body
is.flu.contagious
treat.flu
body.temperature
is.the.flu.contagious
reduce.fever
flu.treatment
flu.vs.cold
how.long.is.the.flu.contagious
fever.reducer

get.over.the.flu
treating.flu
flu.vs..cold
having.the.flu
treatment.for.flu
human.temperature
dangerous.fever
the.flu
remedies.for.flu
influenza.a.and.b
contagious.flu
how.long.does.the.flu.last
fever.flu
oscillococcinum
flu.remedies
how.long.is.flu.contagious
flu.treatments
influenza.symptoms
cold.vs.flu
braun.thermoscan
fever.cough
signs.of.flu
how.long.does.flu.last
normal.body.temperature
get.rid.of.the.flu

type.a.influenza
i.have.the.flu
taking.temperature
flu.versus.cold
bronchitis
how.long.flu
flu.germs
cold.vs..flu
flu.and.cold
thermoscan
flu.complications
high.fever
flu.children
the.flu.virus
how.to.treat.flu
pneumonia
flu.headache
flu.cough
ear.thermometer
how.to.get.rid.of.the.flu
flu.how.long
symptoms.of.bronchitis
cold.and.flu
over.the.counter.flu.medicine
treating.the.flu

flu.care
how.long.contagious
fight.the.flu
reduce.a.fever
cure.the.flu
medicine.for.flu
flu.length
cure.flu
exposed.to.flu
low.body
early.flu.symptoms
remedies.for.the.flu
flu.report
incubation.period.for.flu
break.a.fever
flu.contagious.period
influenza.incubation.period
cold.versus.flu
flu.in.children
what.to.do.if.you.have.the.flu
medicine.for.the.flu
flu.and.fever
flu.lasts
incubation.period.for.the.flu
do.i.have.the.flu

Table 7: All search phrases identified by Google Correlate using data as of 2010-05-22.
Figure 2: Dynamic coefficients for ARGO. Red color represents positive coefficients, blue color
represents negative coefficients, white color represents zero, and grey color represents missing values.
Missing values can be the result of (a) query terms not identified by Google Correlate and (b) Google
Trends data not available for particular query terms. The black horizontal dashed line separates Google
search query terms from autoregressive lags. The yellow vertical dashed line separates coefficients trained
on Google Correlate data from those trained on Google Trends data, and the green vertical dashed line
separates query terms identified on 2009-03-28 from those identified on 2010-05-22.